NV TensorRT RTX EP - initial commit #24456

Merged: 31 commits, Apr 24, 2025

Contributor

@ankan-ban ankan-ban commented Apr 17, 2025

New EP - currently based on existing TensorRT EP but meant to be used on RTX GPUs with a lean version of TensorRT.

Description

This PR adds a new EP based on the TensorRT EP. It will use a special version of TensorRT optimized for RTX GPUs. In the future we plan to streamline the EP further (e.g., remove the dependency on the CUDA EP completely).

Motivation and Context

The new TensorRT for RTX will have:

  1. A much smaller footprint.
  2. Much faster model compile/load times.
  3. Better usability when reusing cached models across multiple RTX GPUs.

This effort is also targeting WCR ML workflows.
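
For readers who want to try the EP from Python, the following is a minimal sketch of provider selection with fallback. It is not part of this PR. The helper `pick_providers` and the list `PREFERRED` are hypothetical, and the string `NvTensorRTRTXExecutionProvider` is an assumption about the name under which the new EP is registered.

```python
# Hypothetical helper, not part of this PR: choose execution providers in
# priority order. The provider name strings below are assumptions about how
# ONNX Runtime registers these EPs.
PREFERRED = [
    "NvTensorRTRTXExecutionProvider",  # assumed name of the new NV TensorRT RTX EP
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available, preferred=PREFERRED):
    """Return the preferred providers that are available, in priority order."""
    available_set = set(available)
    chosen = [name for name in preferred if name in available_set]
    if not chosen:
        raise RuntimeError("none of the preferred execution providers are available")
    return chosen


if __name__ == "__main__":
    # Example input only; a real application would query the runtime instead.
    print(pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
```

With onnxruntime installed, `available` would typically come from `onnxruntime.get_available_providers()`, and the returned list can be passed as the `providers` argument of `onnxruntime.InferenceSession`.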

ankan-ban and others added 2 commits April 17, 2025 17:40
New EP - currently based on existing TensorRT EP but meant to be used on RTX GPUs with a lean version of TensorRT.
@ankan-ban ankan-ban marked this pull request as draft April 17, 2025 13:54
Unload the model once it is no longer needed.

Bug: 5225623
@jywu-msft
Member

please address lintrunner failure
(clangformat) https://github.com/microsoft/onnxruntime/actions/runs/14519273916/job/40745967030?pr=24456

@jywu-msft
Member

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows GPU Doc Gen CI Pipeline, Windows ARM64 QNN CI Pipeline, Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 5 pipeline(s).

@jywu-msft jywu-msft requested a review from chilo-ms April 23, 2025 04:17
@jywu-msft
Member

@chilo-ms please review.

Adds some testing infrastructure and removes lots of deprecated options
@ankan-ban ankan-ban marked this pull request as ready for review April 23, 2025 04:34
@gedoensmax
Contributor

@chilo-ms We will happily take any guidance on how to run more tests using the NV EP to find remaining bugs or implementation gaps.

@chilo-ms
Contributor

chilo-ms commented Apr 23, 2025

> @chilo-ms We will happily take any guidance on how to run more tests using the NV EP to find remaining bugs or implementation gaps.

We need to add a new pipeline/CI for building and testing this new NV EP, which this PR hasn't done yet.
Since the new TRT binary is not public yet, some questions:

  • For building the NV EP, just to double check: I assume the header files from public TRT, i.e. NvInferXXX.h, can be used. (Update) setShapeValuesV2 seems to be a new API that is not in NvInferRuntime.h yet.

  • For running the NV EP: our Windows pipeline uses A10. Can the NV EP run on A10? From what I heard, it's limited to Ampere+ GPUs or 40xx/50xx GPUs?

  • We might need to upload the private TRT binary to our internal blob storage and let the new CI fetch it from there.

@chilo-ms
Contributor

To add a new NV EP pipeline in GitHub Actions, please duplicate the TRT EP's pipeline first and then modify it accordingly.

@jywu-msft
Member

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 5 pipeline(s).

Contributor

@chilo-ms chilo-ms left a comment

The code in ORT that makes this new NV EP known to ORT looks good to me.
The EP code and test/validation are not the focus of this PR; we can discuss them later.

@jywu-msft jywu-msft merged commit 2a09f27 into microsoft:main Apr 24, 2025
70 checks passed
10 participants